Bai S, Kolter J Z, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling[J]. arXiv preprint arXiv:1803.01271, 2018.
1. Overview
In this paper, the authors demonstrate that a simple convolutional architecture (the Temporal Convolutional Network, TCN) outperforms canonical recurrent networks such as the LSTM across a broad range of sequence modeling tasks
- TCN: longer effective memory, more accurate, simpler
- RNN: sequential in time, hard to parallelize
1.1. Background
1.1.1. Application
- part-of-speech tagging and semantic role labelling
- sentence classification
- document classification
- machine translation
- audio synthesis
- language modeling
1.1.2. Architecture
- LSTM
- GRU
- ConvLSTM
- Quasi-RNN
- dilated RNN
2. TCN
2.1. Architecture
- input: a sequence of any length
- output: a sequence of the same length
- no gating mechanism
- longer memory
- causal 1D convolutions with zero padding on the left, so the output keeps the input length
- receptive field determined by:
  - dilation factor d
  - kernel size k
  - network depth n
- dilation grows exponentially with depth (d = 2^i at level i) so that the top layer covers every input within the effective history (see the sketch below)
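A minimal sketch of this dilated causal convolution stack in PyTorch, for reference (not the authors' code; the paper's residual block additionally uses weight normalization and dropout, which are omitted here):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """1D conv that pads on the left only, so output[t] never sees input[t'>t]."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # keeps output length == input length
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):                                # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))        # pad the past side only

class TCNBlock(nn.Module):
    """Two dilated causal convs with ReLU plus a residual connection."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.conv1 = CausalConv1d(in_ch, out_ch, kernel_size, dilation)
        self.conv2 = CausalConv1d(out_ch, out_ch, kernel_size, dilation)
        self.res = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = torch.relu(self.conv2(y))
        return torch.relu(y + self.res(x))

# Dilation doubles per level (d = 2^i), so the receptive field grows exponentially with depth.
layers = [TCNBlock(1 if i == 0 else 32, 32, kernel_size=3, dilation=2 ** i) for i in range(4)]
tcn = nn.Sequential(*layers)
print(tcn(torch.randn(8, 1, 100)).shape)                 # torch.Size([8, 32, 100]), same length
```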
2.2. Advantage
- parallelism
- flexible receptive field size (via kernel size, dilation, and depth; see the calculation after this list)
- stable gradient
- low memory requirement for training
- variable length inputs
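To make the flexible receptive field point concrete: with dilation doubling per level, the receptive field grows exponentially in depth and only linearly in kernel size. A small helper for this arithmetic (my own calculation, assuming two causal convolutions per level as in the sketch above):

```python
def receptive_field(kernel_size, num_levels, convs_per_level=2):
    """Receptive field of a TCN whose dilation doubles per level (d = 2^i)."""
    r = 1
    for i in range(num_levels):
        r += convs_per_level * (kernel_size - 1) * (2 ** i)
    return r

print(receptive_field(kernel_size=3, num_levels=4))   # 61 time steps
print(receptive_field(kernel_size=7, num_levels=8))   # 3061 time steps
```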
2.3. Disadvantage
- data storage during evaluation: the raw sequence up to the effective history must be kept, rather than a fixed-size hidden state as in an RNN
- potential parameter change for a transfer of domain: a new domain may need a different amount of memory, so the receptive field (k, d, depth) may have to be re-tuned
3. Experiments
3.1. Details
- gradient clipping helped convergence; maximum norms in [0.3, 1] worked well (see the sketch after this list)
- the TCN is found to be insensitive to hyperparameter changes, as long as the effective history (receptive field) size is sufficient
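A minimal sketch of the gradient clipping step in PyTorch (the model, optimizer, loss, and the clip value 0.5 are placeholders; the paper only reports that maximum norms chosen from [0.3, 1] helped):

```python
import torch

model = torch.nn.Linear(10, 1)                          # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = model(torch.randn(4, 10)).pow(2).mean()          # placeholder loss

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm before the update; values in [0.3, 1] are reported to help.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)
optimizer.step()
```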